{
 "cells": [
  {
   "cell_type": "raw",
   "id": "a47da0d0-0927-4adb-93e6-99a434f732cf",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_position: 2\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2195672-0cab-4967-ba8a-c6544635547d",
   "metadata": {},
   "source": [
    "# Expansion\n",
    "\n",
    "Information retrieval systems can be sensitive to phrasing and specific keywords. To mitigate this, one classic retrieval technique is to generate multiple paraphrased versions of a query and return results for all versions of the query. This is called **query expansion**. LLMs are a great tool for generating these alternate versions of a query.\n",
    "\n",
    "Let's take a look at how we might do query expansion for our Q&A bot over the LangChain YouTube videos, which we started in the [Quickstart](/docs/use_cases/query_analysis/quickstart)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4079b57-4369-49c9-b2ad-c809b5408d7e",
   "metadata": {},
   "source": [
    "## Setup\n",
    "#### Install dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e168ef5c-e54e-49a6-8552-5502854a6f01",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install -qU langchain langchain-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79d66a45-a05c-4d22-b011-b1cdbdfc8f9c",
   "metadata": {},
   "source": [
    "#### Set environment variables\n",
    "\n",
    "We'll use OpenAI in this example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "40e2979e-a818-4b96-ac25-039336f94319",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
    "\n",
    "# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.\n",
    "# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8b08c52-1ce9-4d8b-a779-cbe8efde51d1",
   "metadata": {},
   "source": [
    "## Query generation\n",
    "\n",
    "To make sure we get multiple paraphrasings we'll use OpenAI's function-calling API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0b51dd76-820d-41a4-98c8-893f6fe0d1ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.pydantic_v1 import BaseModel, Field\n",
    "\n",
    "\n",
    "class ParaphrasedQuery(BaseModel):\n",
    "    \"\"\"You have performed query expansion to generate a paraphrasing of a question.\"\"\"\n",
    "\n",
    "    paraphrased_query: str = Field(\n",
    "        ...,\n",
    "        description=\"A unique paraphrasing of the original question.\",\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "783c03c3-8c72-4f88-9cf4-5829ce6745d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.output_parsers import PydanticToolsParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "system = \"\"\"You are an expert at converting user questions into database queries. \\\n",
    "You have access to a database of tutorial videos about a software library for building LLM-powered applications. \\\n",
    "\n",
    "Perform query expansion. If there are multiple common ways of phrasing a user question \\\n",
    "or common synonyms for key words in the question, make sure to return multiple versions \\\n",
    "of the query with the different phrasings.\n",
    "\n",
    "If there are acronyms or words you are not familiar with, do not try to rephrase them.\n",
    "\n",
    "Return at least 3 versions of the question.\"\"\"\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\"system\", system),\n",
    "        (\"human\", \"{question}\"),\n",
    "    ]\n",
    ")\n",
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0)\n",
    "llm_with_tools = llm.bind_tools([ParaphrasedQuery])\n",
    "query_analyzer = prompt | llm_with_tools | PydanticToolsParser(tools=[ParaphrasedQuery])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f403517a-b8e3-44ac-b0a6-02f8305635a2",
   "metadata": {},
   "source": [
    "Let's see what queries our analyzer generates for the questions we searched earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "af62af17-4f90-4dbd-a8b4-dfff51f1db95",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[ParaphrasedQuery(paraphrased_query='How to utilize multi-modal models sequentially and convert the sequence into a REST API'),\n",
       " ParaphrasedQuery(paraphrased_query='Steps for using multi-modal models in a series and transforming the series into a RESTful API'),\n",
       " ParaphrasedQuery(paraphrased_query='Guide on employing multi-modal models in a chain and converting the chain into a RESTful API')]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_analyzer.invoke(\n",
    "    {\n",
    "        \"question\": \"how to use multi-modal models in a chain and turn chain into a rest api\"\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4d05b888-b0ff-4e00-abc9-98adfb1c92be",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[ParaphrasedQuery(paraphrased_query='How to stream events from LLM agent?'),\n",
       " ParaphrasedQuery(paraphrased_query='How can I receive events from LLM agent in real-time?'),\n",
       " ParaphrasedQuery(paraphrased_query='What is the process for capturing events from LLM agent?')]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_analyzer.invoke({\"question\": \"stream events from llm agent\"})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "poetry-venv-2",
   "language": "python",
   "name": "poetry-venv-2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
